Planning for Data Mining Tool (PDM)
Abstract
We present a tool, PDM (Planning for Data Mining), based on Automated Planning, that helps users (not necessarily experts in data mining) to perform DM (Data Mining) tasks. The starting point is a definition of the DM task to be carried out, and the output is a set of plans that are executed in a DM tool to obtain a set of models and statistics. Plans are data-mining knowledge flows, i.e. sequences of DM actions that should be executed over the initial datasets to obtain the final models. However, the number of feasible plans that solve the same DM task is huge, making it necessary to rank them by some criterion. As a first approach, the ranking is based on expert estimates of the desired mining results of the DM actions. Afterwards, these estimates are improved using machine learning techniques. To define the DM task, we use emerging standards such as PMML (Predictive Model Markup Language). PMML is the leading standard for statistical and DM models and is supported by over 20 vendors and organizations. With PMML, it is straightforward to develop a model on one system using one application and deploy it on another system using another application. The PMML file is automatically translated into a planning problem described in PDDL2.1, so any state-of-the-art planner can be used to generate a plan (or plans). Each plan or knowledge flow is executed by a machine learning engine. In our case, we employ one of the most widely used DM tools, WEKA (Witten and Frank 2005). In WEKA, knowledge flows are described as files with a specific format, KFML, and datasets are described as ARFF (Attribute-Relation File Format) files. The results of the DM process can be evaluated, and new plans may be requested from the planning system.
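The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each DM action carries a single numeric expert estimate and that a plan's score is the sum of its actions' estimates; the abstract does not state how PDM combines estimates. The action names, the values in `EXPERT_COST`, and the helpers `plan_cost` and `rank_plans` are all hypothetical and are not taken from PDM, PDDL2.1, or WEKA.

```python
from dataclasses import dataclass

# Hypothetical expert estimates per DM action (lower is better).
# These names and numbers are illustrative assumptions, not PDM's values.
EXPERT_COST = {
    "load-dataset": 1.0,
    "discretize": 2.0,
    "attribute-selection": 3.0,
    "train-j48": 4.0,
    "train-naive-bayes": 2.5,
    "cross-validate": 5.0,
}


@dataclass(frozen=True)
class Plan:
    """A knowledge flow: an ordered sequence of DM action names."""
    name: str
    actions: tuple


def plan_cost(plan, costs=EXPERT_COST):
    # Assumed additive model: a plan's cost is the sum of its action estimates.
    return sum(costs[action] for action in plan.actions)


def rank_plans(plans, costs=EXPERT_COST):
    # Return the plans ordered from lowest to highest estimated cost.
    return sorted(plans, key=lambda plan: plan_cost(plan, costs))


plans = [
    Plan("A", ("load-dataset", "discretize", "train-j48", "cross-validate")),
    Plan("B", ("load-dataset", "train-naive-bayes", "cross-validate")),
    Plan("C", ("load-dataset", "attribute-selection", "train-j48", "cross-validate")),
]

ranked = rank_plans(plans)
for plan in ranked:
    print(plan.name, plan_cost(plan))
```

Under this assumption, refining the estimates with machine learning would amount to replacing the values in `EXPERT_COST` with learned ones while leaving the ranking function unchanged.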
Similar resources
Improving the Execution of KDD Workflows Generated by AI Planners
PDM is a distributed architecture for automating data mining (DM) and knowledge discovery processes (KDD) based on Artificial Intelligence (AI) Planning. A user easily defines a DM task through a graphical interface specifying the dataset, the DM goals and constraints, and the operations that could be used within the DM process. Then, the tool automatically obtains all the possible models that ...
Distributed Classification for Pocket Data Mining
Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). Large amounts of available data streams that smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices, motivate the development of PDM as a decision making system. This emerging area of study has shown to be feasible ...
Contributions of PDM Systems in Organizational Technical Data Management
Product Data Management (PDM) claims of producing desktop and web based systems to maintain the organizational data to increase the quality of products by improving the process of development, business process flows, change management, product structure management, project tracking and resource planning. Moreover PDM helps in reducing the cost and effort required in engineering. This paper disc...
Application of product data management technologies for enterprise integration
Product Data Management (PDM) systems and their offspring, Collaborative Product Development and Product Lifecycle Management technologies, aim to bring engineering enterprises together, allowing seamless interoperability between different departments and throughout the extended enterprises. However, there are a number of shortcomings in the current crop of commercially available systems, such ...
Efficient Parallel Data Mining for Association Rules
In this paper, we develop an algorithm, called PDM, to conduct parallel data mining for association rules. Consider a transaction as a collection of items, and a large itemset is a set of items such that the number of transactions containing it exceeds a pre-specified threshold. PDM is so designed that the global set of large itemsets can be identified efficiently and the amount of inter-node data...
Publication date: 2010